MicroRNA Interaction network in human: implications of clustered microRNA in biological pathways and genetic diseases
A novel group of small non-coding RNAs, known as microRNAs (miRNAs), is predicted to regulate as many as 90% of the coding genes in human. The diversity and abundance of miRNA targets offer an enormous level of combinatorial possibilities and suggest that miRNAs and their targets form a complex regulatory network. In the present study, we analyzed 711 miRNAs and their 34,525 predicted targets in the miRBase database (http://microrna.sanger.ac.uk/, version 10), which form a complex bipartite network in which numerous genes form the hub. Genes at the hub (9877 in total) are significantly over-represented among genes with specific molecular functions, biological processes and biological pathways, as revealed by analysis with PANTHER (http://www.pantherdb.org/genes/). We further construct a miRNA co-target network by linking every pair of miRNAs that co-target at least one gene. The weight of a link, taken to be the number of genes co-targeted by the pair of miRNAs, varies widely, and we could erase several links while keeping the relevant features of the network intact. The largest connected sub-graph thus obtained contains 479 miRNAs. More than 75% of the miRNAs deregulated in 15 different diseases, collected from published data, are found in this largest sub-graph. We further analyze this sub-graph to obtain 70 small clusters containing a total of 330 of the 479 miRNAs. We identified the biological pathways in which the genes co-targeted by the miRNAs of a cluster are significantly over-represented compared with the genes that are not co-targeted by the miRNAs in that cluster. Using published data, we identified that specific clusters of miRNAs are associated with specific diseases by altering particular pathways. We propose that, instead of single miRNAs, clusters of miRNAs that co-target genes are important for miRNA-mediated regulation in diseases.



I. INTRODUCTION 

Micro-RNA (miRNA) belongs to a class of small non-coding single-stranded RNAs, approximately 21 nucleotides long, that negatively regulate gene expression. In human, the mature miRNA interacts with the 3' untranslated region (3' UTR) of its target gene and down-regulates the expression of the target either by degrading the mRNA or by inhibiting its translation. In some cases, increased expression of the target gene by miRNAs has also been reported (reviewed in [1]). Recent experiments show, at least in a few specific cases, that the mature miRNA can alter the expression of genes by binding to the coding region as well as to the 5' UTR [2-4], adding further complexity to the regulation of genes by miRNAs. On the basis of theoretical analysis, it has been proposed that as many as 30% of the genes in the human genome may be targets of miRNAs [5]. More recent estimates predict that as many as 90% of human genes are miRNA targets [6]. However, experimental validation of such predictions is largely lacking. The function of each region of the mature miRNA is not well defined, although the seed region (positions 2 to 7 from the 5' end of the mature miRNA) is the most important region that interacts with the 3' UTR for regulation of the target genes. Other regions, known as the extended seed and delta seed regions, also contribute to target selection [7]. The diversity and abundance of miRNA targets offer an enormous level of combinatorial possibilities and suggest that miRNAs and their targets form a complex regulatory network.

The functions of only some miRNAs are known, although deregulation of miRNAs has been shown in a number of diseases, such as various types of cancer [8, reviewed in 9], and in cardiovascular development and heart failure [10]. The involvement of miRNAs in various diseases has been reviewed recently [11]. Studies of various diseases and normal cellular processes indicate that miRNAs are involved in the immune system [12] and in stem cell renewal and development [13, 14]. Using inhibitors of different miRNAs, it has also been shown that several miRNAs are involved in cell death, cell growth and proliferation [15]. The extent to which miRNAs modulate their targets, and the influence of this modulation on the biological processes that alter cellular phenotype, varies considerably [16]. In a recent study in Caenorhabditis elegans (C. elegans), 83% of the C. elegans miRNAs (95 miRNAs) were mutated and the effects of these mutations were studied. The majority of the miRNA mutations did not result in any phenotypic change, indicating that there is redundancy among the miRNAs that can target a gene. However, 10% of the miRNA deletions caused clear developmental and morphological defects. It has been proposed that there is significant functional redundancy among miRNAs or among the gene pathways regulated by miRNAs [17]. In spite of considerable information on the involvement of miRNAs in numerous biological processes and in a large number of human diseases, precise information on the targets they regulate in normal processes and in diseases remains elusive.

The role of miRNAs in signal transduction pathways has been identified in a recent work. By analyzing the interactions between miRNAs and a human cellular signaling network, it has been observed that miRNAs specifically target positive regulatory motifs, downstream network components and highly connected scaffolds in the signaling network, as well as genes whose promoter regions include a large number of putative transcription factor (TF) binding sites [18]. Combinatorial regulation of genes by TFs and miRNAs provides more complex regulatory programs [19]. However, it is not fully known how miRNA-miRNA interactions regulate the expression of genes. Interactions of miRNA-miRNA and miRNA-TF in regulating the expression of protein-coding genes have recently been studied. Two databases, TargetScan and PicTar, which catalog 8672 predicted targets of 138 miRNAs and 9152 predicted targets of 178 miRNAs, respectively, have been used; the miRNA lists and targets covered in these two databases overlap substantially (80%) [20]. The abilities of three miRNAs, namely miR-16, miR-34a and miR-106b, to alter the cell cycle, target expression and apoptosis have been studied. miR-16 and miR-34a together block the G1 to S transition more strongly than either does individually, but less than additively. Conversely, expression of miR-106b, which accelerates the G1 to S transition, together with miR-16 or miR-34a results in a fraction of cells in G1 that is lower than with miR-16 or miR-34a alone but higher than with miR-106b alone. These combinations thus exhibit an intermediate cellular phenotype. All these miRNAs alter specific sets of targets; thus, different miRNAs may target different proteins in the same pathway [21]. Experimental evidence has also been provided recently to show that specific pairs of miRNAs are together involved in the maintenance of embryonic stem cells [22]. It is important to know whether miRNAs interact in a combinatorial fashion, alter specific biological pathway(s) and participate in human diseases.

In the present communication, using 711 human miRNAs and their predicted targets, we have performed a topological analysis of the miRNA network to elucidate how miRNA-miRNA interactions regulate the targets. We observe that the miRNAs are clustered and that some clusters of miRNAs co-target the genes in specific pathways. Analysis of published experimental data describing deregulated miRNAs and mRNAs in 15 diseases supports this notion.
Figure 1: Schematic representation of the bipartite connections between miRNAs and their gene targets.



II. RESULTS AND DISCUSSIONS 

A. Connectivity in the miRNA-gene interaction network

For the analysis of human miRNAs and their targets, we use the miRBase database (http://microrna.sanger.ac.uk/), which has $M = 711$ microRNAs (miRNAs) and $N = 34525$ gene targets. First, a data set is created in which each gene and the set of miRNAs targeting that gene are listed row-wise. A schematic representation of the data set as a bipartite network is shown in Figure 1. The blue and red circles there represent the miRNAs and the genes, respectively. For convenience, miRNAs and genes are given arbitrary identification numbers $m = 1, 2, \ldots, i, \ldots, M$ and $n = 1, 2, \ldots, j, \ldots, N$, respectively. A line (or link) is drawn between miRNA $i$ and gene $j$ if $i$ targets $j$. In total, 676265 links connect the miRNAs and their targets. The miRNA-gene system thus forms a bipartite network represented by an $M \times N$ adjacency matrix $A$ with matrix elements

$$A_{ij} = \begin{cases} 1 & \text{if miRNA } i \text{ targets gene } j,\\ 0 & \text{otherwise.} \end{cases}$$
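The construction of $A$ can be sketched in Python, assuming the miRBase targets are available as (miRNA, gene) pairs. The function names and toy identifiers below are hypothetical, and $A$ is stored sparsely as two dictionaries of sets rather than as a dense $711 \times 34525$ array.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Sparse form of the M x N adjacency matrix A.

    targets[i] is the set of genes j with A_ij = 1 (a row of A);
    regulators[j] is the set of miRNAs i with A_ij = 1 (a column of A).
    Genes and miRNAs live in separate maps, so no gene-gene or
    miRNA-miRNA link can ever be recorded: the graph is bipartite.
    """
    targets = defaultdict(set)
    regulators = defaultdict(set)
    for mirna, gene in pairs:
        targets[mirna].add(gene)
        regulators[gene].add(mirna)
    return dict(targets), dict(regulators)


def count_links(targets):
    """Number of non-zero entries of A (676265 for the human data set)."""
    return sum(len(genes) for genes in targets.values())


# Toy input with hypothetical identifiers; the repeated pair is stored once.
pairs = [("m1", "g1"), ("m1", "g2"), ("m2", "g2"), ("m3", "g3"), ("m1", "g1")]
targets, regulators = build_bipartite(pairs)
print(count_links(targets))      # 4
print(sorted(regulators["g2"]))  # ['m1', 'm2']
```

Using sets makes duplicate (miRNA, gene) records harmless, which matches the 0/1 definition of $A_{ij}$.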

The network is bipartite because there is no link between two elements within the group of genes or within the group of miRNAs, i.e. two genes or two miRNAs are never connected. In Figure 2A, we show the distribution of targets $P_g(k)$, which is the fraction of genes (shown as circles in Figure 1) targeted by $k$ miRNAs. If the connections had been random (i.e., if the 676265 links were distributed among the 711 miRNAs randomly), we would have obtained a normal distribution (blue solid line in Figure 2A). Clearly, $P_g(k)$ differs significantly from what is expected for a random bipartite graph. $P_g(k)$ could be fitted to an exponential distribution (shown in the inset of Figure 2A) with a typical scale $k^* = 20$. This implies that most genes are targeted by fewer than 20 miRNAs ($k < 20$). Fewer genes are targeted by a large number ($k > 20$) of miRNAs; these are termed the target hubs. The degree distribution, being different from that of a random graph, implies a high degree of organizational structure in the gene-miRNA interaction network. A similar organizational structure is observed in the distribution $P_m(k)$, which is the fraction of miRNAs that target $k$ genes. We find that the 126 miRNAs in the miRNA hub target more than 1067 genes each (data not shown).

Figure 2A: Distribution of targets $P_g(k)$, the fraction of genes targeted by $k$ miRNAs (red circles), compared with a random network (blue lines). The inset shows the same plot on a semi-logarithmic scale.
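The two degree distributions, and a random reference for $P_g(k)$, can be computed as below. This is a sketch under our own assumption about the null model: each of the $M$ possible links of a gene is present independently with probability $p = L/(MN)$, where $L = 676265$ is the number of links, so a gene's degree is binomial with mean $L/N \approx 19.6$; for $M = 711$ this binomial is close to the normal curve described above. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def degree_distribution(adjacency):
    """P(k): fraction of nodes with exactly k neighbours.

    Pass gene -> miRNAs to get P_g(k), or miRNA -> genes to get P_m(k).
    """
    counts = Counter(len(nbrs) for nbrs in adjacency.values())
    total = len(adjacency)
    return {k: c / total for k, c in sorted(counts.items())}


def random_gene_degree_pmf(k, n_links, n_mirnas, n_genes):
    """Binomial null (assumption): P(a gene is targeted by k of n_mirnas miRNAs)."""
    p = n_links / (n_mirnas * n_genes)
    return comb(n_mirnas, k) * p**k * (1 - p) ** (n_mirnas - k)


regulators = {"g1": {"m1"}, "g2": {"m1", "m2"}, "g3": {"m3"}, "g4": {"m1", "m2"}}
print(degree_distribution(regulators))  # {1: 0.5, 2: 0.5}

L, M, N = 676265, 711, 34525
print(round(L / N, 1))  # mean gene degree under the null: 19.6
```

Comparing the empirical $P_g(k)$ with `random_gene_degree_pmf` over $k = 0, \ldots, M$ gives the kind of contrast shown in Figure 2A.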

Analysis of the genes in the target hub (9877 in total; Supplementary Table S1A) with PANTHER revealed that genes with molecular functions such as nucleotide binding, transcription factor, receptor and hydrolase are enriched compared with the genes in the human genome. Genes involved in biological processes such as apoptosis, cell cycle, developmental processes, nucleic acid metabolism and signal transduction are significantly over-represented among the genes in the target hub. In contrast, genes involved in cell proliferation and differentiation, cell structure and motility, and oncogenesis are significantly under-represented. Genes in the target hub are enriched in specific pathways such as the p53 pathway, angiogenesis, and two neurodegenerative disease pathways, namely the Alzheimer's disease and Huntington's disease pathways. A representative result of the pathway enrichment analysis is shown in Figure 2B, and the full analysis is given in Supplementary Tables S1B to S1D. These results indicate that miRNAs in combination may alter molecular functions, biological processes and cellular pathways. This result is similar to that obtained with a smaller number of miRNAs (about 180) that target approximately 10000 genes [20].
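The kind of over-representation reported by PANTHER can be illustrated with a one-sided binomial test: if a fraction $p$ of the reference genes (here, the genome) belongs to a pathway, how surprising is it to see at least $x$ pathway genes in a query list of $n$ genes (here, the target hub)? This is a minimal sketch assuming a binomial test of this kind; it is not PANTHER's code, it omits any multiple-testing correction, and the counts in the example are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_log_pmf(i, n, p):
    """log of C(n, i) p^i (1 - p)^(n - i), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_enrichment_p(x, n, k_ref, n_ref):
    """One-sided P(X >= x) with X ~ Binomial(n, k_ref / n_ref).

    x     : query genes (e.g. target-hub genes) annotated to the pathway
    n     : size of the query list
    k_ref : reference genes (e.g. whole genome) annotated to the pathway
    n_ref : size of the reference list
    """
    p = k_ref / n_ref
    if not 0 < p < 1:
        raise ValueError("pathway must be a non-empty, proper subset of the reference")
    tail = sum(exp(binom_log_pmf(i, n, p)) for i in range(x, n + 1))
    return min(tail, 1.0)


# Hypothetical counts: 30 pathway genes among 1000 hub genes,
# versus 200 among 20000 reference genes (10 expected).
print(binomial_enrichment_p(30, 1000, 200, 20000))
```

Working in log space keeps the test usable for query lists as large as the 9877 hub genes, where the binomial coefficients would overflow a float.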



B. MicroRNA co-target network

To probe the detailed structure of the network, we create a miRNA-miRNA co-target network as follows: if two miRNAs co-target $w > 0$ genes, we define these miRNAs to be linked, with link weight $w$; if $w = 0$, no connection is made between them. In this way we form an undirected weighted network of miRNAs. The corresponding adjacency matrix $C$ is symmetric, with elements $C_{ij}$ equal to the number of genes co-targeted by miRNAs $i$ and $j$; in terms of the matrix $A$ defined above, $C_{ij} = \sum_{n=1}^{N} A_{in} A_{jn}$ for $i \neq j$. The diagonal elements $C_{ii}$ are not well defined; we take $C_{ii} = 0$. A schematic description of this network is shown in Figure 2C.

Figure 2B: Comparison of various pathways in the human genome and among the genes in the target hubs. Representative pathways are shown in which the genes are significantly enriched in the target hubs (filled bars) compared with the human genome (open bars), as determined using PANTHER.
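Because $C_{ij}$ counts the genes $n$ with $A_{in} = A_{jn} = 1$, the co-target weights can be accumulated gene by gene: every gene targeted by $k$ miRNAs adds one to each of its $k(k-1)/2$ miRNA pairs. The sketch below assumes the gene-to-miRNA map is available as a dictionary of sets (hypothetical names and toy data). It stores only the upper triangle $i < j$, leaves $C_{ii} = 0$ implicit, and its summed weights give the total weight of the network.

```python
from collections import Counter
from itertools import combinations


def cotarget_weights(regulators):
    """Upper-triangle co-target weights {(i, j): C_ij} for miRNAs i < j.

    regulators maps each gene to the set of miRNAs that target it.
    Pairs with no common target are simply absent (C_ij = 0), and the
    diagonal C_ii = 0 is never stored.
    """
    weights = Counter()
    for mirnas in regulators.values():
        for i, j in combinations(sorted(mirnas), 2):
            weights[(i, j)] += 1
    return weights


# Toy data (hypothetical): g1 is hit by m1, m4; g2 by m1, m2, m4; g3 by m3 only.
regulators = {"g1": {"m1", "m4"}, "g2": {"m1", "m2", "m4"}, "g3": {"m3"}}
C = cotarget_weights(regulators)
print(C[("m1", "m4")], C[("m2", "m3")])  # 2 0
print(sum(C.values()))                   # total weight: 1 + 3 = 4
```

Iterating over genes rather than over all $M(M-1)/2$ miRNA pairs avoids intersecting 711 target sets pairwise.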

A few properties of this co-target network can be checked easily. First, it is a "fully connected network", which does not break into separate sub-graphs. The distribution of weights $P_g(w)$ is shown in Figure 2D, where it is again compared with the corresponding random graph. The result indicates that the connections are not random and that the network is highly organized. Again, we find a huge difference between the maximum weight $(C_{ij})_{\max} = 1253$ and the minimum $(C_{ij})_{\min} = 1$, which leads to the conjecture that most links might be irrelevant and can be erased. Next, we describe how to obtain an optimal set of miRNAs, by erasing irrelevant links, that still describes all essential features of the co-target network.

We define weak links as those whose weights are smaller than a pre-decided cutoff $q$, and erase them. (For example, when $q = 10$, the links (1,6) and (710,3) in Figure 2C are erased.) The network then no longer remains fully connected and breaks up into smaller disconnected sub-graphs. Let the number of these disconnected sub-graphs be $N_q$, which explicitly depends on $q$. The largest among these sub-graphs, named $G$, is the important sub-graph. Clearly, the size of $G$ decreases as $q$ increases. For very large $q$, $G$ has only a few miRNAs; this might simplify the study of the network, but at the cost of losing some of its functionality. Thus, we need to optimize $q$ such that all irrelevant links are erased while $G$ keeps the optimal set of functionally relevant miRNAs. Note that the cutoff $q$ defines a new adjacency matrix

$$C^q_{ij} = \begin{cases} 0 & \text{if } C_{ij} < q,\\ 1 & \text{otherwise.} \end{cases}$$

Thus, the number of sub-graphs $N_q$ is the number of diagonal blocks of $C^q$, which can be found by block-diagonalizing the matrix $C^q$. Since the original network is fully connected, initially $N_1 = 1$, and $N_q$ then increases with $q$. Figure 2D shows a plot of $N_q$ versus $q$.
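Counting the diagonal blocks of $C^q$ is the same as counting the connected components of the graph that keeps only links with $C_{ij} \ge q$. A union-find pass over the links does this without explicitly block-diagonalizing a $711 \times 711$ matrix. The sketch below is our own illustration (hypothetical names, toy weights); isolated miRNAs are counted as one-node sub-graphs, and the first component returned is the largest one, $G$.

```python
def components_at_cutoff(nodes, weights, q):
    """Connected components after erasing every link with weight < q.

    nodes   : all miRNA identifiers
    weights : {(i, j): C_ij} for i < j
    Returns the components as sets, largest first; len(result) is N_q.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (i, j), w in weights.items():
        if w >= q:  # C^q_ij = 1
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=len, reverse=True)


nodes = ["m1", "m2", "m3", "m4", "m5"]
weights = {("m1", "m2"): 12, ("m2", "m3"): 4, ("m3", "m4"): 15, ("m1", "m5"): 1}
for q in (1, 5, 20):
    print(q, len(components_at_cutoff(nodes, weights, q)))  # N_q = 1, 3, 5
```

Scanning $q$ over a grid and recording `len(...)` at each value yields an $N_q$ curve of the kind plotted in Figure 2D.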

It is evident from Figure 2D that there are three regimes: (a) $q < q^*$, (b) $q \approx q^*$ and (c) $q > q^*$. Almost all the connections are present in regime (a). In this regime the connections are quite robust, and the network does not break up until a threshold $q = q^*$ is reached. Increasing $q$ takes one into regime (b), where the irrelevant connections have already been erased and only the relevant connections are present; thus any small change in $q$ changes $N_q$ substantially. The relevant miRNAs, which cooperatively regulate gene expression to form the co-target network, are probably found in regime (b). In regime (c), $q$ is large enough to erase even the relevant connections. The only connections that survive in regime (c) are possibly due to chemical (sequence) similarity, i.e., the miRNAs form very small subgroups in which the seed sequences (positions 2 to 7 of the mature miRNA) are identical. We verified this proposition as follows: we take $q = 600$, find all the disconnected sub-graphs that have two or more miRNAs, and check the seed sequences of the miRNAs in every subgroup. We find that in all but one of the 63 subgroups the miRNAs have identical seed regions [see Supplementary Table S2].
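The seed check amounts to comparing nucleotides 2 to 7 of the mature sequences within each subgroup. A minimal sketch, assuming mature sequences are available as strings keyed by miRNA name; the names and the short sequences below are arbitrary toy strings, not miRBase entries.

```python
def seed(mature_seq):
    """Seed region: nucleotides 2-7 (1-based) of the mature miRNA, i.e. seq[1:7]."""
    return mature_seq[1:7]


def has_identical_seed(subgroup, sequences):
    """True if every miRNA in the subgroup has the same seed region."""
    return len({seed(sequences[m]) for m in subgroup}) == 1


sequences = {  # arbitrary toy sequences, for illustration only
    "a": "UGAGGUAGCCAUGACUA",
    "b": "UGAGGUAUUUCGACGAA",
    "c": "UAGCAGCAAGGUCCAAU",
}
print(seed(sequences["a"]))                       # GAGGUA
print(has_identical_seed(["a", "b"], sequences))  # True
print(has_identical_seed(["a", "c"], sequences))  # False
```

Applied to every multi-miRNA component obtained at $q = 600$, this test separates seed-sharing subgroups from the rest.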

Note that the continuous break-up transition is a distinctive property of the miRNA co-target network; it does not occur for a random graph, as shown in Figure 2D (blue line). To emphasize this point further, let us construct a random undirected graph $R$ whose total weight $W$ is the same as that of $C$, i.e.,

$$W = \sum_{i=1}^{M} \sum_{j=i+1}^{M} C_{ij} = 10354685 = \sum_{i=1}^{M} \sum_{j=i+1}^{M} R_{ij}.$$

Effectively, we need to choose $M(M-1)/2 = 252405$ random non-negative integers, one for each upper off-diagonal element of $C$, such that their sum is $W$. Since the random matrix $R$ has to be symmetric ($R_{ij} = R_{ji}$), the lower off-diagonal elements are obtained by setting $R_{ji} = R_{ij}$. In a similar way, by varying the cutoff $q$ we find $N_q$ for $R$, which is shown in Figure 2D (blue dashed line). Note that in this case the break-up of the random network into sub-graphs occurs discontinuously, unlike that of the miRNA co-target network. This result indicates that the miRNA co-target network is structurally rich. The detailed topological structure of the network is now being studied.
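The text does not specify how the $M(M-1)/2$ random integers summing to $W$ are drawn. One simple choice, assumed here, is a multinomial allocation: each of the $W$ units of weight is dropped into an upper-triangle pair chosen uniformly at random. The sketch below (hypothetical function name) returns the upper triangle only, with $R_{ji} = R_{ij}$ implied. For the real network, where $W$ is of order $10^7$, a pure-Python loop like this is slow, so it is shown on a small example.

```python
import random
from collections import Counter


def random_weighted_graph(m, total_weight, seed=0):
    """Random upper-triangle weights {(i, j): R_ij}, i < j, summing to total_weight.

    Assumption: multinomial allocation, i.e. each unit of weight goes to a
    pair chosen uniformly at random. Pairs that receive nothing have R_ij = 0
    and are omitted; R_ji = R_ij is implied by symmetry.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    hits = Counter(rng.randrange(len(pairs)) for _ in range(total_weight))
    return {pairs[k]: count for k, count in hits.items()}


print(711 * 710 // 2)  # 252405 upper-triangle entries for M = 711

R = random_weighted_graph(m=10, total_weight=500)
print(sum(R.values()))  # 500
```

Under this allocation each weight $R_{ij}$ is approximately Poisson with mean $W/[M(M-1)/2]$, so the weights are concentrated around one value, which is consistent with an abrupt rather than gradual break-up as $q$ increases.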




Figure 2C: Representation of the miRNA network. Here miRNAs (1 to 711) are shown as blue circles. In this example, $C_{14} = C_{41} = 17$ and $C_{23} = C_{32} = 0$, meaning that miRNAs 1 and 4 co-target 17 genes, whereas miRNAs 2 and 3 have no common target.



The analysis reveals that the optimally connected graph (the sub-graph having the maximum number of relevant miRNA nodes) that still possesses the properties of the network can be found if we work at $q = q^*$. To find $q^*$, we differentiate $N_q$ numerically with respect to $q$. The maximum of $dN_q/dq$ (Figure 2D) corresponds to $q^* = 103$. At $q^* = 103$, $N_q = 166$ and the largest connected sub-graph $G$ is found to contain only 479 nodes (for the full list of miRNAs in the 166 sub-graphs, see Supplementary Table S3). These nodes, or miRNAs, thus form the optimal set of miRNAs that co-regulate gene expression. To determine how miRNAs are organized within the sub-graph $G$, we increase the cutoff $q$ beyond $q^*$ and follow how $G$ breaks up into smaller clusters of miRNAs. Clusters in this context are defined as sub-graphs that contain more than one miRNA. From Figure 2D, the number of such clusters is expected not to change much once we reach regime (c), where $dN_q/dq$ becomes small, which starts at approximately $q \approx 160$. We therefore take $q = 160$ and collect the clusters (the sub-graphs of $G$ that have two or more miRNAs); we denote the set of these clusters by $S$. There are 70 such clusters in total, containing 330 miRNAs. It is thus reasonable to assume that the miRNAs within $G$ are organized as 70 clusters, 149 independent miRNAs and the interactions between them. These clusters form the basic units of the miRNA interaction network. Figure 3 shows the interaction network of the 70 miRNA clusters. In the present study, we clustered the miRNAs on the basis of their ability to co-target the same genes; however, such co-targeting could be due to similarity in
the seed sequences. We checked that 18 of the 70 clusters might be due to seed similarity. About 50% of human miRNAs are clustered within 10 kb of genomic sequence. Whether genomically clustered miRNAs are under the same regulatory sequences and are expressed together remains unknown. Eleven of the 70 clusters might be due to genomic organization (Supplementary Table S4).

Figure 2D: (a) Number of disconnected sub-graphs $N_q$ when links with weight $< q$ are erased; (b) the same for a random network. The plot of $dN_q/dq$ (scaled by a factor of 20 for better visibility) shows a peak at $q^* = 103$.
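Locating $q^*$ as the maximum of the numerical derivative can be sketched as follows. The central-difference scheme and the sample curve are our own assumptions for illustration (the $N_q$ values are invented, not the data of Figure 2D); in practice $N_q$ would be obtained by thresholding $C$ at each $q$ on a grid.

```python
def optimal_cutoff(qs, n_q):
    """Return the q at which the numerical derivative dN_q/dq is largest.

    qs must be increasing. At interior point k the derivative is estimated
    by the central difference (N_{k+1} - N_{k-1}) / (q_{k+1} - q_{k-1}).
    """
    best_q, best_slope = None, float("-inf")
    for k in range(1, len(qs) - 1):
        slope = (n_q[k + 1] - n_q[k - 1]) / (qs[k + 1] - qs[k - 1])
        if slope > best_slope:
            best_q, best_slope = qs[k], slope
    return best_q


# Invented N_q curve with a steep rise near q = 100 (not the data of Figure 2D).
qs = [40, 60, 80, 100, 120, 140, 160]
n_q = [1, 2, 5, 60, 140, 160, 165]
print(optimal_cutoff(qs, n_q))  # 100
```

A central difference is used so that the estimated slope is attached to the grid point itself rather than to the midpoint of an interval; on a fine integer grid of $q$ the choice makes little difference.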

We extended our analysis to study miRNA- 
miRNA interaction networks in Caenorhabditis 
elegans with 136 miRNAs and their 20366 targets 
(http://microrna.sanger.ac.uk/cgi-bin/targets/v5/genome. 
The optimal value of $q$ in this network is $q^* = 23$. The largest connected sub-graph at $q = q^*$ contains 98 miRNAs. Surprisingly, we find that both species show universal features near the break-down transition, similar to those shown for human in Figure 2D (data not shown). The reasons for such similarity in the break-down of miRNA-miRNA interaction networks require further study.

It is also interesting to note that the clusters of miRNAs form an interaction network through their common targets (shown in Figure 3 and discussed in section 4.4). The biological significance of this network is under investigation.



C. Co-targeted genes in miRNA clusters may be involved in specific pathways

Because the miRNA clusters form the basic units of biological interaction for the regulation of targets, we need to understand their biological relevance better. The smallest cluster in $S$ contains two miRNAs, while the largest contains 47 miRNAs (Supplementary Table S4). To explore whether the miRNAs in a cluster target genes in specific pathways, we determine whether the genes co-targeted within a miRNA cluster are involved in any specific pathway. To do so, we determine the common targets of the miRNAs in a cluster, taking two miRNAs at a time, and then ask whether the co-targeted genes are enriched for particular pathway(s) compared with the targets that are not co-targeted by the miRNAs in that cluster. For example, cluster no. 25 (Supplementary Table S4) contains three miRNAs, namely miR-603, miR-521 and miR-523, which do not share a common seed sequence. In pairwise combination [miR-603 and miR-521 ($n_1$), miR-521 and miR-523 ($n_2$), and miR-523 and miR-603 ($n_3$)] they target 449 genes ($n_1 + n_2 + n_3$). These three miRNAs together target 2309 genes, so 1860 (2309 - 449) of their targets are not co-targeted by the miRNAs in this cluster. Using PANTHER, we observed that the co-targeted genes (449) are significantly ($p < 0.05$) enriched in the axon guidance mediated by semaphorins (P00007), cell cycle (P00013), DNA replication (P00017) and FGF signaling (P00021) pathways. This analysis further reveals that, of the 70 clusters, the co-targeted genes of 47 clusters are significantly enriched in at least one biological pathway (the PANTHER database lists 153 biological pathways in human). In several cases, the co-targeted genes are over-represented in more than one pathway (Supplementary Table S4). The significant enrichment of genes co-targeted by the miRNAs of a cluster indicates that these clustered miRNAs regulate genes in specific pathways.

Figure 3: Interaction network of the 70 clusters containing 330 miRNAs (links with weight < 2000 are erased).
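The split of a cluster's targets used in this comparison can be sketched with set operations. Here a gene counts as co-targeted if at least one pair of cluster members targets it (the union of the pairwise intersections). This union equals $n_1 + n_2 + n_3$ only when no gene is targeted by all three miRNAs, since such a gene appears in every pairwise overlap. The names and target sets below are hypothetical.

```python
from itertools import combinations


def split_cluster_targets(cluster, targets):
    """Split the targets of a miRNA cluster into co-targeted and singly targeted genes.

    cluster : list of miRNA names
    targets : miRNA name -> set of target genes
    A gene is co-targeted if at least one pair of cluster members targets it.
    """
    all_targets = set().union(*(targets[m] for m in cluster))
    co_targeted = set()
    for a, b in combinations(cluster, 2):
        co_targeted |= targets[a] & targets[b]
    return co_targeted, all_targets - co_targeted


targets = {  # hypothetical target sets
    "miR-A": {"g1", "g2", "g3"},
    "miR-B": {"g2", "g4"},
    "miR-C": {"g3", "g4", "g5"},
}
co, single = split_cluster_targets(["miR-A", "miR-B", "miR-C"], targets)
print(sorted(co), sorted(single))  # ['g2', 'g3', 'g4'] ['g1', 'g5']
```

The two returned sets play the roles of the 449 co-targeted genes and the 2309 - 449 = 1860 remaining targets in the cluster 25 example, and can be passed to a pathway enrichment test.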

D. Involvement of clustered miRNAs in human 
diseases 

Alterations of several miRNAs have been observed experimentally in about eighty diseases [25]. To probe whether the clustered miRNAs obtained above are associated with any disease, we retrieved data from published papers that report deregulated miRNAs in diseases. We collected data for 15 diseases in which deregulation of miRNAs has been reported (Supplementary Table S5; see the supplementary text for the references). These diseases (with the number of deregulated miRNAs) are autism (31), schizophrenia (22), diabetes (glucose homeostasis) (19), ovarian cancer (54), AML (38), breast cancer (28), colon cancer (19), lung cancer (39), pancreatic cancer (56), prostate cancer (48), stomach cancer (25), HNC (59), thyroid cancer (51), CLL (53) and glioblastoma (11) (Supplementary Table S5). We note that the total number of miRNAs studied in these reports is not available for all cases, which makes it difficult to identify the miRNAs that are unchanged in these diseases. Among the altered miRNAs, the numbers found in the largest sub-graph $G$ (479 miRNAs, $q = 103$) and in the 70 clusters ($S$) derived from $G$ with $q = 160$ are shown in Table I. It is evident from the table that, for some diseases, the number of deregulated miRNAs in $G$ is significantly higher than one expects if the miRNAs were distributed randomly. For example, in prostate cancer 48 miRNAs are deregulated, of which 41 belong to $G$ (479 miRNAs). The expected number of these miRNAs in $G$ is 32 ($479 \times 48/711$), which is significantly lower than the observed value of 41. Interestingly, when the largest sub-graph $G$ is broken into the 70 interconnected clusters ($S$), the significance of the excess of deregulated miRNAs over the expected values increases considerably for several diseases (Table I). This trend further justifies breaking the largest sub-graph $G$ into smaller clusters.
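The expected counts in Table I follow from random placement: if $d$ deregulated miRNAs were spread at random over all 711 miRNAs, a sub-graph with $s$ members would be expected to receive $s \cdot d / 711$ of them (for prostate cancer, $479 \times 48/711 \approx 32.3$ in $G$ and $330 \times 48/711 \approx 22.3$ in $S$). The statistical test behind the p-values in Table I is not stated here; the sketch below assumes a one-sided hypergeometric test, so its p-values need not match the table exactly. The function names are hypothetical.

```python
from math import comb


def expected_overlap(n_sub, n_disease, n_total=711):
    """Expected number of disease miRNAs inside a sub-graph under random placement."""
    return n_sub * n_disease / n_total


def hypergeom_upper_tail(x, n_sub, n_disease, n_total=711):
    """P(X >= x): n_disease miRNAs drawn without replacement from n_total,
    of which n_sub lie in the sub-graph (hypergeometric null, an assumption)."""
    top = min(n_sub, n_disease)
    hits = sum(comb(n_sub, i) * comb(n_total - n_sub, n_disease - i)
               for i in range(x, top + 1))
    return hits / comb(n_total, n_disease)


# Prostate cancer: 48 deregulated miRNAs, 41 observed in G (479 miRNAs).
print(round(expected_overlap(479, 48), 1))  # 32.3
print(hypergeom_upper_tail(41, 479, 48))    # small one-sided p-value
```

Exact integer binomial coefficients are used so the tail sum is not affected by floating-point overflow; the final division of two large integers is exact up to float rounding.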

We then examined how the miRNAs deregulated in each disease are distributed among the 70 clusters in $S$ and compared this distribution with that expected at random. For example, of the 28 miRNAs deregulated in breast cancer, 18 are distributed in 11 of the 70 clusters (clusters #3, 16, 20, 35, 37, 51, 57, 63, 64, 67 and 68; Supplementary Table S4). Of these 11 clusters, 8 (clusters #3, 16, 20, 35, 37, 51, 57 and 64) contain significantly more deregulated miRNAs than expected from a random distribution. Similarly, of the 22 miRNAs deregulated in schizophrenia, the 18 that lie in $S$ are distributed in 11 clusters, of which only 8 contain significantly more miRNAs than expected. Thus the distributions of deregulated miRNAs among the clusters differ significantly from random distributions. A representative result is shown in Figure 4, and detailed results for all 15 diseases are shown in Supplementary Figure SF1. This result shows that significant numbers of the miRNAs deregulated in the diseases studied here belong to specific clusters. Only the clusters that harbor significantly more deregulated miRNAs than expected from random distributions are considered for further analysis (see next section). As shown in the previous section, the co-targeted genes of different clusters are significantly enriched in different pathways.

The co-targeted genes of clusters 4, 55, 57 and 64 are enriched in the Huntington's disease pathway (P00029). Cluster 4 consists of miR-423-3p and miR-24. This cluster contains significantly more deregulated miRNAs than expected from random distributions in schizophrenia, colon cancer, pancreatic cancer, prostate cancer, stomach cancer, head and neck cancer and thyroid cancer (Supplementary Figure SF1). This result indicates that the Huntington's disease pathway might be deregulated in these diseases. To look for experimental support for this proposition, we searched PubMed for "Huntington disease" and "schizophrenia"; more than 300 references contain both terms. Schizophrenia-like symptoms in a Huntington's disease pedigree suggest that, at the molecular level, the Huntington's disease pathway and schizophrenia might overlap (Corra et al., 2006). In Huntington's disease (HD), accelerated loss of striatal neurons by apoptotic death is the major cellular event (Gil and Rego, 2008), whereas in cancer cell death is generally prevented; apoptosis might thus play opposite roles in HD and cancer. An epidemiological study indicates that the prevalence of cancer among HD patients is lower than that among individuals without HD (Sorensen et al., 1999). Knockout of p53, a tumor suppressor gene that is mutated in 50% of cancers, induces tumors in mice and decreases their life span. Recently, it has been shown that expression of the mutant allele of the HTT gene (expansion of CAG repeats beyond 36 in exon 1 causes HD) prolongs the life of p53-knockout mice. This observation has been explained by enhanced apoptosis induced by mutant HTT, which is likely to prevent cancer (Ryan and Scrable, 2008). This result further indicates that the Huntington's disease pathway may be altered in cancer. Our finding that a specific cluster of miRNAs targets this pathway suggests that this pathway might be altered in cancer as well as in schizophrenia.

We further tested whether specific clusters of miRNAs are involved in specific diseases. We collected mRNA expression data from the published literature for autism (number of deregulated mRNAs: 795), schizophrenia (146) and diabetes (189) [Supplementary Table S6 and supplementary text]. Similar data for ovarian cancer (6219), AML (5440), breast cancer (6100), colon cancer (932), lung cancer (2747), pancreatic cancer (6950), prostate cancer (4410), stomach cancer (589), HNC (5468), thyroid cancer (3092), CLL (1967) and glioblastoma (6528) were taken from ONCOMINE (http://www.oncomine.org/), and we determined whether the deregulated genes are enriched in specific pathway(s). Detailed results are shown in Supplementary Table S6. Note that only a subset of these deregulated genes is likely to be deregulated by miRNAs. We compared the pathways enriched among the deregulated genes in the microarray data (relative to the genes of the human genome) with the pathways obtained for the deregulated miRNAs in these diseases, i.e. those enriched among the co-targeted genes of the clusters described above. As mentioned above, in breast cancer 28 miRNAs are deregulated; 18 of them are distributed in 11 of the 70 clusters, of which 8 harbor significantly more miRNAs than expected from a random distribution. The co-targeted genes in these 8 clusters are over-represented in 12 unique biological pathways (13 pathways in total; Supplementary Table S7). We then compared these with the pathways enriched for the deregulated genes and observed that 10 unique pathways are common to both. The common pathways enriched among the genes co-targeted by specific clusters of miRNAs and among the genes deregulated in diseases further indicate that specific clusters of miRNAs are likely to target specific pathway(s) and to be involved in diseases.

TABLE I: Significant numbers of miRNAs associated with particular diseases (known from published data) are present in G. Even more significantly, they belong to the subgroup S consisting of cluster-forming miRNAs.

Disease           No. of miRNAs        No. of miRNAs in G       No. of miRNAs in S
                  (published data)     (expected) [p-value]     (expected) [p-value]
Autism            31                   26 (21) [0.050]          20 (14) [0.043]
Schizophrenia     22                   19 (15) [0.057]          18 (10) [0.001]
Diabetes          19                   13 (13) [0.922]           9 (9)  [0.933]
Ovarian Cancer    54                   41 (36) [0.180]          32 (25) [0.058]
AML               38                   29 (26) [0.240]          23 (18) [0.081]
Breast Cancer     28                   23 (19) [0.095]          18 (13) [0.058]
Colon Cancer      19                   13 (13) [0.922]          10 (9)  [0.587]
Lung Cancer       39                   28 (26) [0.556]          20 (18) [0.542]
Pancreas Cancer   56                   41 (38) [0.351]          35 (26) [0.016]
Prostate Cancer   48                   41 (32) [0.008]          38 (22) [0.000]
Stomach Cancer    25                   18 (17) [0.621]          15 (12) [0.173]
HNC               59                   50 (40) [0.004]          39 (27) [0.002]
Thyroid Cancer    51                   40 (34) [0.092]          32 (24) [0.019]
CLL               53                   44 (36) [0.015]          36 (25) [0.002]
Glioblastoma      11                    9 (7)  [0.307]           9 (5)  [0.019]

AML: acute myeloid leukemia; HNC: head and neck cancer; CLL: chronic lymphocytic leukemia.

The significance of co-targeting of genes by miRNAs is not clear and requires further investigation. Limited experimental evidence indicates that miRNAs influence the expression of their targets moderately (Lim et al., 2005; Selbach et al., 2008), and miRNAs have been proposed to be involved in the fine-tuning of gene expression (Flynt and Lai, 2008). Thus, in combination, miRNAs might reduce the expression of their target genes further than a single miRNA could. Combinatorial regulation of target genes by miRNAs might be complicated by the location of the target sites in the 3' UTR as well as by the tissue-specific expression of the miRNAs. Some target genes have overlapping or very closely spaced miRNA target sites; for such genes, a second miRNA might have no additional effect because binding of one miRNA at a nearby or overlapping site physically hinders access to the other site. The same gene may also be targeted by different miRNAs when miRNA expression is regulated at the tissue level.

Figure 4: Distribution of the deregulated miRNAs in breast
cancer (29 miRNAs). 19 of the 29 miRNAs are distributed in
11 clusters; a significant excess is found in only 8
clusters (#3, 16, 20, 35, 37, 51, 57, 64).

In our analysis, the seventy clusters of miRNA together
contain 330 miRNAs, and each cluster contains at least two
miRNAs. Of these 70 clusters, 18 could arise from seed
similarity and 11 from genomic organization
(co-localization); 4 clusters are common to both groups
(Supplementary table S4). It is interesting to note that,
of the 18 clusters that could arise from seed sequence
similarity, only 5 are associated with pathways enriched
in co-targeted genes. Two clusters (#26 and #31) that
could arise from both seed sequence similarity and
co-localization are also associated with co-targeted
genes enriched in specific pathways. This result indicates
that, in the majority of cases, clustered miRNAs did not
arise from seed sequence similarity. For 5 clusters, the
co-targeted genes enriched in specific pathways could
arise from co-localization of the miRNAs in the genome.
These miRNAs, which are co-localized in the same
chromosomal region (within 10 kb upstream and downstream
of the pre-miRNA), might be regulated by the same promoter
regions and co-expressed. In a recent study, miRNAs
hsa-mir-92a-1, hsa-mir-20a, hsa-mir-18a, hsa-mir-17,
hsa-mir-19b-1 and hsa-mir-19a, which are co-localized on
chromosome 13 within 10 kb, have been shown to act in a
combinatorial fashion.
The authors explained that, in combination, the miRNAs
down-regulated the target to a measurable extent, whereas
an individual miRNA was unable to down-regulate the target
sufficiently [21]. This result also supports the notion
that miRNAs in combination lower expression more than an
individual miRNA does. In the majority (47/70) of the
miRNA clusters, co-targeted genes are enriched in at least
one pathway. Of the 153 pathways described in PANTHER
(25431 genes in total and 7151 pathway hits; one gene can
be involved in more than one pathway), 105 distinct
pathways are enriched with the co-targeted genes in 47
clusters. Among these 47 clusters, 28 contain a
significant excess of miRNAs deregulated in diseases.
Co-targeted genes in these 28 clusters are enriched in 72
distinct pathways. The same pathway appears multiple times
in different diseases; altogether, the 72 pathways appear
359 times across the diseases. For example, the
Huntington's disease pathway (P00029) appears 19 times in
15 diseases, indicating the importance of this pathway in
several diseases. The miRNA clusters in which co-targeted
genes are enriched in this particular pathway and which
also harbor significantly more deregulated miRNAs in
diseases are #4 (miR-423-3p, miR-24), #33 (miR-149,
miR-892b), #52 (miR-508-5p, miR-516a-3p, miR-198,
miR-520a-5p, miR-517*, miR-525-5p, miR-516b, miR-518c*,
miR-518e*, miR-518d-5p, miR-518f*), #55 (miR-373*,
miR-616*, miR-888) and #64 (miR-331-3p, miR-146b-3p,
miR-18b*, miR-18a*, miR-324-5p, miR-874, miR-324-3p,
miR-10a, miR-10b). These miRNAs are likely to target 172
genes in the Huntington's disease pathway and to
participate in the diseases discussed in this manuscript.
It is interesting to mention, although its significance,
if any, remains unknown, that the Huntington's disease
pathway was also over-represented among the genes at the
target hub, as described in an earlier section.
Experimental verification is necessary to substantiate the
prediction that the miRNAs in these clusters together
alter this pathway.



III. CONCLUSION 

In conclusion, we constructed a weighted miRNA-miRNA
interaction network using 711 miRNAs and their predicted
targets. A novel method was applied to break up the
network into smaller sub-graphs (clusters). We further
showed that specific clusters of miRNA might be involved
in specific pathways that are known or predicted to be
involved in several diseases. We propose that, instead of
a single miRNA, a group of miRNAs together targets genes
in specific pathways, and that the interactions of these
clusters are likely to be involved in diseases.

IV. METHOD 
A. Data mining 

All the miRNAs and their predicted targets are downloaded
from miRBase (http://microrna.sanger.ac.uk/, version 10).
Version 10 of this database contains 711 miRNAs coded by
533 genomic loci. This is because the double-stranded
miRNA produced by DICER and other proteins is separated by
a helicase, and in some cases both strands can act as
mature single-stranded miRNAs. In general, one strand is
expressed more than the other; the less-expressed strand
is denoted miR*. The predicted number of transcripts
(mRNAs) targeted by these 711 miRNAs is 34525. Note that
the total number of human genes in PANTHER is 25431; thus,
some of the miRNA targets in the Sanger database are
likely to be isoforms of the same genes. All predicted
miRNA targets in miRBase are identified by Ensembl IDs
(http://www.ensembl.org/). For analysis with different
databases, the IDs are converted into NCBI gene IDs using
the Clone/Gene ID Converter (http://idconverter.bioinfo.cnio.es/)
and the Ensembl Genome Browser (http://www.ensembl.org/index.html).
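Because the miRBase targets are transcript-level Ensembl IDs while PANTHER counts genes, transcript targets have to be collapsed to genes before any gene-level analysis. The Python sketch below shows one way to do this, assuming an ID-mapping table (for example exported from the Clone/Gene ID Converter or Ensembl) is already available as a dictionary. The function name `collapse_to_genes` and all IDs in the example are hypothetical, not part of any of the tools named above.

```python
from collections import defaultdict


def collapse_to_genes(targets, transcript_to_gene):
    """Map transcript-level miRNA targets to gene-level targets.

    targets: dict miRNA -> iterable of transcript IDs (e.g. Ensembl IDs).
    transcript_to_gene: dict transcript ID -> gene symbol (assumed mapping
    table). Transcripts missing from it are dropped, like IDs whose gene
    symbol could not be retrieved.
    Returns dict miRNA -> set of gene symbols, so several isoforms of one
    gene count as a single target.
    """
    genes = defaultdict(set)
    for mirna, transcripts in targets.items():
        for tx in transcripts:
            gene = transcript_to_gene.get(tx)
            if gene is not None:
                genes[mirna].add(gene)
    return dict(genes)


# Hypothetical example: TX1 and TX2 are isoforms of GENE_A, TX3 is unmapped.
targets = {"miR-x": ["TX1", "TX2", "TX3"], "miR-y": ["TX2", "TX4"]}
mapping = {"TX1": "GENE_A", "TX2": "GENE_A", "TX4": "GENE_B"}
result = collapse_to_genes(targets, mapping)
print({m: sorted(g) for m, g in result.items()})
# {'miR-x': ['GENE_A'], 'miR-y': ['GENE_A', 'GENE_B']}
```

A miRNA whose targets are all unmapped simply disappears from the result, which mirrors omitting those IDs from the PANTHER analysis.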

Genes that are down-regulated or up-regulated in various
cancers are downloaded from ONCOMINE
(http://www.oncomine.org/). For other diseases, we
collected the deregulated genes from various published
data. ONCOMINE provides gene names/symbols; these are
converted into NCBI gene IDs (NM numbers) using the
Clone/Gene ID Converter (http://idconverter.bioinfo.cnio.es/)
and the other web sites mentioned above. The deregulated
miRNAs in the different diseases are also collected from
published data. Full references are provided in the
supplementary text.

B. Formation of network 

We use in-house Perl scripts to convert the dataset
containing 711 miRNAs and their targets into an adjacency
matrix A_ij (see text) by assigning arbitrary but unique
identification numbers to the genes, i = 1, 2, ..., N, and
to the miRNAs, j = 1, 2, ..., M. A_ij takes the value 1 if
gene i contains a predicted recognition site for miRNA j
in its 3' UTR, and A_ij = 0 otherwise. The corresponding
network is shown in Figure 1, where the genes (miRNAs) are
represented as red (blue) circles and a line joining blue
circle j and red circle i is drawn if A_ij = 1.

Further, matrix A is used to construct a miRNA co-target
network. Two miRNAs j and k are said to be connected if
they have at least one common target. The weight of the
connection (link) is defined to be C_jk, which is simply
the number of common targets of miRNAs j and k. Matrix C
has dimension 711 x 711 and can be constructed from A as
C = A^T A, where the transpose of A is defined as
(A^T)_ji = A_ij. In other words,

$$ C_{jk} = \sum_{i=1}^{N} A_{ij} A_{ik} \qquad (5) $$
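Equation (5) can be evaluated without forming the full N x M matrix A: each gene i contributes 1 to C_jk for every pair of miRNAs j, k that target it. The Python sketch below does exactly that. It is not the authors' Perl script; the input format (a list of (miRNA, gene) pairs) and the function name `cotarget_matrix` are assumptions for illustration. The diagonal C_jj is the number of targets of miRNA j; only off-diagonal entries define links.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def cotarget_matrix(pairs):
    """Co-target matrix C = A^T A from (miRNA, gene) target predictions.

    Row i of A is represented implicitly by the list of miRNAs targeting
    gene i, so each gene adds A_ij * A_ik = 1 to C[j][k] for every pair of
    its miRNAs. Returns the sorted miRNA names and C as a list of lists.
    """
    pairs = set(pairs)                    # duplicate predictions still give A_ij = 1
    mirnas = sorted({m for m, _ in pairs})
    index = {m: j for j, m in enumerate(mirnas)}
    by_gene = defaultdict(list)           # gene i -> indices j with A_ij = 1
    for m, g in pairs:
        by_gene[g].append(index[m])
    C = [[0] * len(mirnas) for _ in mirnas]
    for js in by_gene.values():
        for j, k in combinations_with_replacement(sorted(js), 2):
            C[j][k] += 1
            if j != k:
                C[k][j] += 1              # keep C symmetric
    return mirnas, C


# Hypothetical predictions: three miRNAs, three genes.
pairs = [("miR-1", "g1"), ("miR-1", "g2"), ("miR-2", "g2"),
         ("miR-1", "g3"), ("miR-2", "g3"), ("miR-3", "g3")]
mirnas, C = cotarget_matrix(pairs)
print(mirnas, C)   # ['miR-1', 'miR-2', 'miR-3'] [[3, 2, 1], [2, 2, 1], [1, 1, 1]]
```

The cost grows with the sum over genes of the squared number of targeting miRNAs rather than with N x M x M, which matters for a 34,525-target data set.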



C. Clustering of miRNAs in miRNA co-target 
network 

Since the miRNA co-target network is fully connected, with
a large number of links having very small weights, we
would like to get rid of these links and find the
sub-graph that contains only the relevant miRNAs. This can
be done by erasing the weak links, i.e. the links with
weight less than a cutoff q. The network then breaks up
into several disconnected sub-graphs, N_q in total. Thus,
N_q is just the number of diagonal blocks of the
thresholded matrix C (after suitable reordering), which
was calculated using an in-house C program. N_q changes
marginally as q is increased, and then a rapid change
occurs at q = q*. It is assumed that the network breaks up
rapidly as the relevant (most significant) links are
removed. To find q*, we differentiate N_q with respect to
q numerically and find the peak position, which
corresponds to the q value where the change is maximum.
We then fix q = q* and find the largest sub-graph G among
the N_q* sub-graphs; G is considered to be the relevant
one. To see how the miRNAs are arranged within G, we
further increase q to 160 (see text for the explanation of
why we choose q = 160) and collect the sub-graphs of G.
Sub-graphs of G containing more than one miRNA are called
clusters.
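The thresholding procedure can be sketched in Python as follows. This is not the authors' in-house C program: it finds the disconnected sub-graphs with a union-find structure instead of reordering C into diagonal blocks, counts isolated miRNAs as sub-graphs of size one, and takes q* to be the first q after the steepest forward-difference rise of N_q. All three choices, the q grid and the toy weights are assumptions made for illustration; `q_fine` plays the role of q = 160 in the text.

```python
def components(C, q, nodes=None):
    """Disconnected sub-graphs left after erasing every link with C[j][k] < q.

    C is a symmetric co-target matrix (list of lists); diagonal entries are
    ignored. Isolated miRNAs are returned as sub-graphs of size one.
    """
    nodes = list(range(len(C))) if nodes is None else list(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, j in enumerate(nodes):
        for k in nodes[a + 1:]:
            if C[j][k] >= q:              # keep only links with weight >= q
                parent[find(j)] = find(k)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def find_q_star(C, qs):
    """q where N_q rises fastest, from forward differences over the grid qs."""
    n_q = [len(components(C, q)) for q in qs]
    slopes = [(n_q[t + 1] - n_q[t]) / (qs[t + 1] - qs[t]) for t in range(len(qs) - 1)]
    t_peak = max(range(len(slopes)), key=slopes.__getitem__)
    return qs[t_peak + 1]                 # first q after the steepest break-up (assumption)


def clusters_in_G(C, q_star, q_fine=160):
    """Largest sub-graph G at q*, and its sub-graphs with >1 miRNA at q_fine."""
    G = max(components(C, q_star), key=len)
    sub = components(C, q_fine, nodes=sorted(G))
    return G, [s for s in sub if len(s) > 1]


# Toy example (hypothetical weights): miRNAs 5-8 hang on miRNA 0 by weak links.
C = [[0] * 9 for _ in range(9)]
for j, k, w in [(0, 1, 200), (1, 2, 200), (3, 4, 200), (2, 3, 180),
                (0, 5, 20), (0, 6, 20), (0, 7, 20), (0, 8, 20)]:
    C[j][k] = C[k][j] = w

qs = list(range(10, 211, 20))
q_star = find_q_star(C, qs)
G, clusters = clusters_in_G(C, q_star, q_fine=190)
print(q_star, sorted(G), sorted(sorted(s) for s in clusters))
# 30 [0, 1, 2, 3, 4] [[0, 1, 2], [3, 4]]
```

In the toy network the four weakly attached miRNAs drop off together at q = 30, which is the largest jump in N_q; raising q further then splits G into two clusters.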



D. Network of miRNA clusters 

To obtain a network of miRNA clusters, we first compute
the weight W_mn of the link between clusters m and n. Let
cluster m (n) have N_m (N_n) miRNAs. We add the weights
(numbers of co-targets) of every pair of miRNAs formed by
taking one miRNA from cluster m and the other from cluster
n, and define the sum to be the weight of the link. Thus,
W forms a (70 x 70) matrix with elements

$$ W_{mn} = \sum_{j \in m} \sum_{k \in n} C_{jk}, \qquad m \neq n. $$

The diagonal elements W_nn are non-zero and indicate the
strength of cluster n, i.e. the total number of pair-wise
co-targets within the cluster. There is no natural cutoff
on the weights, as they show a scale-free distribution
(data not shown). We take an arbitrary cutoff of 2000 to
draw the network, i.e. a link is drawn between clusters m
and n only if W_mn > 2000.
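The cluster weight matrix and the 2000 cutoff can be sketched in Python as below. The off-diagonal sum follows the definition above; for the diagonal we assume, reading "total number of pair-wise co-targets" literally, a sum over distinct pairs j < k inside the cluster, so C_jj (the target count of a single miRNA) is not included. Function names and the toy counts are hypothetical.

```python
from itertools import combinations


def cluster_weights(C, clusters):
    """Cluster-cluster weight matrix W from a miRNA co-target matrix C.

    Off-diagonal: W[m][n] = sum of C[j][k] over j in cluster m, k in cluster n.
    Diagonal (assumption): sum of C[j][k] over distinct pairs j < k inside
    the cluster, i.e. the total number of pair-wise co-targets.
    """
    K = len(clusters)
    W = [[0] * K for _ in range(K)]
    for m in range(K):
        W[m][m] = sum(C[j][k] for j, k in combinations(sorted(clusters[m]), 2))
        for n in range(m + 1, K):
            W[m][n] = W[n][m] = sum(C[j][k] for j in clusters[m] for k in clusters[n])
    return W


def cluster_links(W, cutoff=2000):
    """Pairs of clusters joined in the drawn network (W[m][n] > cutoff)."""
    K = len(W)
    return [(m, n) for m in range(K) for n in range(m + 1, K) if W[m][n] > cutoff]


# Toy example with hypothetical co-target counts for four miRNAs in two clusters.
C = [[0, 10, 3, 1],
     [10, 0, 2, 0],
     [3, 2, 0, 7],
     [1, 0, 7, 0]]
W = cluster_weights(C, [[0, 1], [2, 3]])
print(W, cluster_links(W, cutoff=5))   # [[10, 6], [6, 7]] [(0, 1)]
```

The strict inequality in `cluster_links` matches "W_mn > 2000"; with real data the default cutoff of 2000 would be used.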



E. Co-targeted genes in miRNA clusters 

To identify the target genes of a specific cluster, we
first find the co-targeted genes pair-wise within the
cluster. For example, if three (hypothetical) miRNAs,
miR-1, miR-2 and miR-3, are in a cluster, the predicted
common targets of the miRNAs are those common to miR-1 and
miR-2 (n_1), to miR-2 and miR-3 (n_2), and to miR-1 and
miR-3 (n_3). The pathway(s) to which these common targets
(n = n_1 + n_2 + n_3) belong are obtained from PANTHER.
This result is compared with that obtained for the targets
that are not common (m - n in total, where m is the total
number of predicted targets of miR-1, miR-2 and miR-3).
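The pair-wise construction above can be sketched in Python as follows, reusing the three hypothetical miRNAs of the example with made-up target genes A to F. One assumption is worth flagging: the sketch returns the common and non-common targets as sets, so a gene shared by several pairs appears once, whereas n = n_1 + n_2 + n_3 in the text counts it once per pair. The function name is hypothetical.

```python
from itertools import combinations


def cotargeted_genes(targets, cluster):
    """Pair-wise co-targets inside one miRNA cluster.

    targets: dict miRNA -> set of target genes.
    Returns (pair_counts, common, not_common): pair_counts[(a, b)] is the
    number of genes shared by miRNAs a and b (n_1, n_2, ... in the text),
    common is the set of genes shared by at least one pair, and not_common
    holds the remaining targets of the cluster.
    """
    pair_counts = {}
    common = set()
    for a, b in combinations(cluster, 2):
        shared = targets[a] & targets[b]
        pair_counts[(a, b)] = len(shared)
        common |= shared
    all_targets = set().union(*(targets[m] for m in cluster))
    return pair_counts, common, all_targets - common


targets = {"miR-1": {"A", "B", "C", "D"},
           "miR-2": {"B", "C", "E"},
           "miR-3": {"A", "C", "F"}}
pairs, common, rest = cotargeted_genes(targets, ["miR-1", "miR-2", "miR-3"])
print(pairs)   # {('miR-1', 'miR-2'): 2, ('miR-1', 'miR-3'): 2, ('miR-2', 'miR-3'): 1}
print(sum(pairs.values()), sorted(common), sorted(rest))   # 5 ['A', 'B', 'C'] ['D', 'E', 'F']
```

Here n = 5 counts gene C three times, while the common set holds three distinct genes; the two gene sets are what would be submitted to PANTHER for comparison.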



F. Classification of genes in the target hub, genes
co-targeted by miRNAs in clusters and genes deregulated in
diseases using PANTHER, and enhancement analysis

In PANTHER (Protein Analysis Through Evolutionary
Relationships), genes are classified on the basis of their
functions using published experimental observations and,
in the absence of direct experimental evidence,
evolutionary relationships. Proteins are classified by
expert biologists and categorized by molecular function,
biological process and biological pathway
(http://www.pantherdb.org/). In the miRNA target search,
targets are identified by Ensembl IDs, which include
several isoforms of the genes. We omitted these isoforms,
as well as other Ensembl IDs for which we were unable to
retrieve the gene symbols, from the PANTHER analysis. To
identify whether genes in the target hub, genes
co-targeted by miRNAs in clusters and genes deregulated in
diseases are over-represented (enhanced) in particular
molecular-function, biological-process or pathway
categories in comparison with the human genome, we
analyzed the genes using PANTHER. If the fraction of genes
in the test sample that falls in a particular category is
significantly enriched over that in the human genome, we
consider that category to be involved in, or associated
with, the test sample. For example, if the co-targeted
genes in a particular cluster are significantly
over-represented in the Wnt signaling pathway (PANTHER ID
P00057), we consider this cluster to be associated with
this pathway.

Deregulated expression of genes observed in the diseases
discussed in this manuscript, taken from the published
literature, is analyzed using PANTHER. Pathways showing a
significant (p < 0.05) increase in genes in comparison
with the human genome are identified.
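The text does not state which statistic underlies the PANTHER comparison. A one-sided binomial test is one common way to ask whether a category is over-represented in a gene list relative to a reference genome; the Python sketch below implements that test as an assumption, not as PANTHER's own code, and the numbers in the example are hypothetical.

```python
from math import comb


def binomial_enrichment(k, n, p_ref):
    """One-sided binomial test for over-representation of a category.

    k: test genes in the category; n: test genes analysed;
    p_ref: fraction of reference-genome genes in the category.
    Returns (fold enrichment k / (n * p_ref), p = P(X >= k) for X ~ Binomial(n, p_ref)).
    """
    expected = n * p_ref
    p = sum(comb(n, x) * p_ref ** x * (1 - p_ref) ** (n - x) for x in range(k, n + 1))
    return k / expected, p


# Hypothetical numbers: 3 of 10 test genes fall in a pathway holding 10% of the genome.
fold, p = binomial_enrichment(3, 10, 0.1)
print(round(fold, 2), round(p, 4))   # 3.0 0.0702
```

With a threefold enrichment but only ten genes, this hypothetical category would not pass the p < 0.05 criterion used in the text.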

G. Correlation of miRNAs in clusters and diseases

From the published literature we collected 15 diseases in
which the deregulation of several miRNAs has been
reported. We then ask how these reported miRNAs, say n in
total, are distributed among the 70 different clusters
(obtained at q = 160, see text for details). This
distribution is then compared with a random distribution
of n miRNAs among the 70 clusters. Clusters having
significantly more miRNAs (p < 0.05) than expected from a
random distribution are considered relevant to that
particular disease. Pathways significantly enriched in
the genes commonly targeted by miRNAs in these relevant
clusters are compared with the pathways significantly
enriched among the deregulated mRNAs (from micro-array
data).
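The tally behind this comparison can be sketched in Python as below: for one disease, count how many of its reported miRNAs fall in each cluster and how many would be expected if those miRNAs were placed at random, i.e. in proportion to cluster size. Counting only the disease miRNAs that fall in some cluster, and the function and variable names, are assumptions; the significance of any excess would then be judged with the chi-square statistic described under Statistical analysis.

```python
from collections import Counter


def cluster_counts(disease_mirnas, cluster_of, cluster_sizes):
    """Observed vs. randomly expected numbers of disease miRNAs in each cluster.

    disease_mirnas: miRNAs reported as deregulated in one disease.
    cluster_of: dict miRNA -> cluster index (clustered miRNAs only).
    cluster_sizes: dict cluster index -> number of miRNAs in that cluster.
    Expected counts assume the clustered disease miRNAs are placed at random,
    i.e. in proportion to cluster size.
    """
    hits = Counter(cluster_of[m] for m in set(disease_mirnas) if m in cluster_of)
    placed = sum(hits.values())          # disease miRNAs that fall in any cluster
    total = sum(cluster_sizes.values())  # all clustered miRNAs (330 in the text)
    return {c: (hits[c], placed * size / total) for c, size in cluster_sizes.items()}


# Hypothetical example: six clustered miRNAs in three clusters; "x" is unclustered.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 2}
sizes = {0: 2, 1: 3, 2: 1}
print(cluster_counts(["a", "b", "c", "x"], cluster_of, sizes))
# {0: (2, 1.0), 1: (1, 1.5), 2: (0, 0.5)}
```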

H. Comparison of the pathways enriched in commonly targeted
genes in clusters with those enriched in deregulated mRNAs
in diseases

From ONCOMINE and the published literature we collected
the deregulated genes in 15 diseases and analyzed them
using PANTHER. Significantly altered pathways are
identified. These are then compared with the pathways
obtained for the commonly targeted genes in the clusters.
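The comparison reduces to set operations on pathway IDs, as in the Python sketch below. It also distinguishes pathway hits counted per cluster from distinct pathways, the distinction behind figures such as "12 unique biological pathways (13 pathways in total)". P00029 and P00057 are the PANTHER IDs quoted in this paper; PX1 and PX2, the cluster numbers and the function name are hypothetical.

```python
def compare_pathways(cluster_pathways, disease_pathways):
    """Compare pathways from co-targeted genes in clusters with disease pathways.

    cluster_pathways: dict cluster index -> set of PANTHER pathway IDs enriched
    in that cluster's co-targeted genes.
    disease_pathways: set of pathway IDs enriched among deregulated genes.
    Returns (total, unique, common): pathway hits counted per cluster, distinct
    pathways over all clusters, and distinct pathways shared with the disease.
    """
    total = sum(len(p) for p in cluster_pathways.values())
    unique = set().union(*cluster_pathways.values())
    return total, unique, unique & disease_pathways


total, unique, common = compare_pathways(
    {3: {"P00029", "P00057"}, 16: {"P00029", "PX1"}}, {"P00029", "PX2"})
print(total, len(unique), sorted(common))   # 4 3 ['P00029']
```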



I. Statistical analysis 

Calculation of the p value is best described with an
example. If n = 11 is the number of miRNAs present in a
cluster among a total of N = 330 miRNAs, and for a
particular disease there are m = 2 miRNAs in that cluster
among a total of M = 30 miRNAs, then

$$ \chi^2 = \left( \frac{mN}{M} - n \right)^2 \left[ \frac{1}{n} + \frac{1}{N-n} \right], $$

which is 11.38 in this example. The p value is the
probability that x > χ² in the chi-square distribution
Q(x) with one degree of freedom. Thus,

$$ p = 1 - \int_0^{\chi^2} Q(x)\, dx, $$

which can be readily integrated using Mathematica or read
from a table. In this example p = 0.00074. Note that
p < 0.050 is equivalent to χ² > 3.841.
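The worked example can be reproduced in Python without Mathematica or a table. The sketch assumes one degree of freedom, which is what the 3.841 threshold implies; for that case the upper tail 1 - ∫_0^{χ²} Q(x) dx has the closed form erfc(sqrt(χ²/2)), available as `math.erfc`. The function names are ours.

```python
from math import erfc, sqrt


def chi_square_stat(n, N, m, M):
    """chi^2 = (m*N/M - n)^2 * [1/n + 1/(N - n)], the statistic defined in the text.

    n: miRNAs in the cluster; N: all clustered miRNAs;
    m: disease miRNAs in the cluster; M: disease miRNAs in total.
    """
    return (m * N / M - n) ** 2 * (1 / n + 1 / (N - n))


def p_value(chi2):
    """P(X > chi2) for a chi-square variable X with one degree of freedom.

    With one degree of freedom, 1 - integral_0^chi2 Q(x) dx = erfc(sqrt(chi2 / 2)).
    """
    return erfc(sqrt(chi2 / 2))


chi2 = chi_square_stat(n=11, N=330, m=2, M=30)
print(round(chi2, 2), round(p_value(chi2), 5))   # 11.38 0.00074
print(round(p_value(3.841), 3))                  # 0.05
```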

VI. SUPPLEMENTARY MATERIAL

Kindly e-mail PKM (pk.mohanty@saha.ac.in) to obtain the
Supplementary Material.

A. Supplementary tables 

Table S1: List of the genes targeted by more than 20
miRNAs (target hub) [S1A]. Genes at the target hub are
classified using PANTHER according to molecular functions,
biological processes and biological pathways and compared
with the human genome (S1B, S1C and S1D).

Table S2: List of the miRNAs that have the same seed-region
nucleotides (positions 2 to 7 of the mature miRNA).

Table S3: List of miRNAs in the 166 sub-graphs found at
q* = 103.

Table S4: List of the 70 clusters and the miRNAs belonging
to each cluster, with seed sequence similarity and genomic
organization (only miRNAs clustered together within a
10 kb genomic region are noted). In each cluster, the
common targets of each pair of miRNAs are determined and
added. Biological pathways for the common targets are
compared with those for the targets that are not common.
Only the pathways that are significantly (p ≤ 0.05)
enriched in the targets are shown.

Table S5: Experimentally observed deregulated miRNAs in 15
different diseases. For references to the sources of these
miRNAs, please see the supplementary text.

Table S6: Analysis of deregulated genes in 15 different
diseases. For references to the sources of the
experimentally determined deregulated genes, please see
the supplementary text.

Table S7: Pathways significantly enriched in altered mRNAs
for 15 different diseases. Pathways that are common to
those predicted from the miRNA clusters are marked (in
red).



B. Supplementary Figures 

SF1: Distributions of various experimentally deter- 
mined deregulated miRNAs among 70 clusters of miR- 
NAs 



C. Supplementary Text 

References to the literature from which we collated the
deregulated miRNAs and genes.



